Table 1: (2+1) flavor dynamical DWF ensembles generated by the RBC and UKQCD collaborations,
with $\tilde{m}_{l,s} = m_{l,s} + m_{res}$. The first 2 ensembles, with acceptance in boldface, were
generated with a different variant of Rational Hybrid Monte Carlo (RHMC [...]) (RHMC I in [...]).
t(MD) denotes the total trajectory length in MD units, and numbers with "+" denote ongoing production.

1. Introduction 

Continuing theoretical advances in lattice gauge theory, especially in chiral fermion
formulations and fermion simulation algorithms, together with increasing computational resources,
are making systematic continuum extrapolation of many QCD quantities without uncontrolled
systematic errors a reality. The RBC and UKQCD collaborations have generated dynamical 2+1 flavor
Domain Wall Fermion (DWF) ensembles with $a^{-1} \sim 1.7$ GeV [...], which have allowed
extrapolation in quark mass and lattice volume. Table 1 lists the existing 2+1 flavor DWF ensembles.

Gauge ensembles with a smaller lattice spacing are the obvious next step in making continuum
extrapolations more systematic. To this end, the RBC and UKQCD collaborations started generating
$\beta = 2.25$, $32^3 \times 64 \times 16$ dynamical DWF configurations with 2 different light quark
masses. Recently, the LHPC collaboration joined this effort, and as a result part of the ensembles
is now being generated with the joint allocation on the DOE QCDOC at Brookhaven National
Laboratory. We are aiming at $a^{-1} \sim 2.2$ GeV and $m_{PS} L > 4$, which will allow us to bring
the statistical and systematic errors down to the few-percent level for lattice studies of
quantities such as weak matrix elements and hadron matrix elements.

A detailed description of the simulation algorithm and its performance is given in section 2, and
basic quantities and preliminary mass measurements on the $m_l = 0.004$ ensemble are presented in
section 3.

2. Simulation details 

As described in [2, 3, ...], we use the combination of the DWF formulation of Furman and
Shamir [...] and the Iwasaki gauge action, which has been shown to suppress lattice dislocations
enough to give DWF good chiral symmetry while allowing enough topology tunneling for the range of
lattice spacings we are interested in.

The simulation of 2 light quarks and 1 strange quark is actually done as a combination of (1+1+1)
flavors of strange quarks, treated with the rational quotient approximation, and 2 flavors of light
quarks preconditioned by the strange quark [7]. While the preconditioning mass does not have to be
the same as the strange quark mass, we found the strange quark mass to be close to optimal as the
preconditioning mass in DWF simulations on smaller volumes. Using
$\mathcal{D}(m_f) = D^\dagger_{DWF}(M_5; m_f)\, D_{DWF}(M_5; m_f)$, where $M_5$ is the domain wall
height, fixed at 1.8, and $m_f$ is the DWF mass term, the fermion determinant with the
corresponding Pauli-Villars fields can be written as
where $\mathcal{R}_a(x)$ denotes the rational approximation of $x^a$ and each determinant term is
evaluated with a separate pseudofermion. The Omelyan integrator [...] with $\lambda = 0.22$ is used
at each level of the multiscale integrator, with $N_{step} = 16$ and
$\Delta t_{light} : \Delta t_{heavy} : \Delta t_{gauge} = 1/8 : 1/8 : 1/48$.
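As a sketch of the factorization described in this paragraph, the following identity shows how one strange and two light flavors can be regrouped into three rational-quotient factors and one preconditioned light-quark factor. It is a reconstruction from the prose, not necessarily the authors' exact expression. It assumes that $\mathcal{D}(m_f)$ counts two flavors, that the Pauli-Villars fields have $m_f = 1$, and that each strange factor uses the exponent $a = 1/2$.

```latex
\det\!\left[\frac{\mathcal{D}(m_l)}{\mathcal{D}(1)}\right]
\det\!\left[\frac{\mathcal{D}(m_s)}{\mathcal{D}(1)}\right]^{1/2}
=
\det\!\left[\frac{\mathcal{D}(m_l)}{\mathcal{D}(m_s)}\right]
\left\{ \det\!\left[\frac{\mathcal{D}(m_s)}{\mathcal{D}(1)}\right]^{1/2} \right\}^{3}
\approx
\det\!\left[\frac{\mathcal{D}(m_l)}{\mathcal{D}(m_s)}\right]
\left\{ \det \mathcal{R}_{1/2}\!\left(\frac{\mathcal{D}(m_s)}{\mathcal{D}(1)}\right) \right\}^{3}
```

Both sides equal $\det\mathcal{D}(m_l)\,[\det\mathcal{D}(m_s)]^{1/2}/[\det\mathcal{D}(1)]^{3/2}$. The first factor on the right is the light-quark quotient preconditioned by the strange quark, and the three identical factors are the (1+1+1) strange flavors.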

Hasenbusch preconditioning suppresses the force from the light quarks, which allows the light
quark term to take the largest step among the different terms (the nature of a higher-order
integrator such as Omelyan effectively makes $\Delta t_{heavy}$ half of $\Delta t_{light}$),
decreasing the computational cost significantly. We also decided to simulate with trajectory
length $\tau = 2$ so that configurations possibly decorrelate more effectively.
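To illustrate the integrator structure, here is a minimal sketch of a single-level second-order minimum-norm (Omelyan) step applied to a toy harmonic oscillator. The function names, the toy force and the Hamiltonian are illustrative assumptions and are not taken from the production code. Only $\lambda = 0.22$, the step size 1/8 and the trajectory length 2 come from the text.

```python
def omelyan_step(q, p, force, h, lam=0.22):
    """One Omelyan (2nd-order minimum-norm) step of size h.

    Scheme: p-kick(lam*h), q-drift(h/2), p-kick((1-2*lam)*h),
            q-drift(h/2), p-kick(lam*h).
    """
    p += lam * h * force(q)
    q += 0.5 * h * p
    p += (1.0 - 2.0 * lam) * h * force(q)
    q += 0.5 * h * p
    p += lam * h * force(q)
    return q, p


def integrate(q, p, force, tau=2.0, n_steps=16, lam=0.22):
    """Integrate a trajectory of length tau using n_steps Omelyan steps."""
    h = tau / n_steps
    for _ in range(n_steps):
        q, p = omelyan_step(q, p, force, h, lam)
    return q, p


def toy_force(q):
    """Force of the toy Hamiltonian H = p^2/2 + q^2/2 (an assumption)."""
    return -q


def toy_energy(q, p):
    return 0.5 * p * p + 0.5 * q * q


# Toy trajectory: tau = 2 with 16 steps gives h = 1/8, as quoted in the text.
q1, p1 = integrate(1.0, 0.0, toy_force)
delta_h = toy_energy(q1, p1) - toy_energy(1.0, 0.0)
```

In a nested (multiscale) version, each `q += 0.5 * h * p` drift would be replaced by a call to the next level with time `h / 2`, so the inner level is advanced in half steps of the outer one.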

The combination of higher-order, multiscale integrators and (rational) quotient terms makes 
the evolution program a heavily nested one. One way to describe this is 

$\tau = 2$ = 6 MInv + 1 CG + [[12 GF + [3 MInv + 2 RF] × 3] × 2 + 12 GF + 1 CG + HF] × 32
    + [12 GF + [3 MInv + 2 RF] × 3] × 2 + 12 GF + 6 MInv + 1 CG

where MInv, CG, GF, RF, and HF denote the multimass solver for the rational quotient terms, the
inverter for the preconditioned light quarks, the gauge force, the rational quotient fermion force,
and the quotient fermion force, respectively. Expanding the expression without changing the order
of terms gives all the computational routines in an MD trajectory, in order.
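The nested expression can be expanded mechanically. The sketch below is one way to count how many times each routine is called per trajectory, assuming the bracket structure is read exactly as written above. The helper names `term` and `scale` are hypothetical.

```python
from collections import Counter


def term(name, n=1):
    """n calls of a single routine."""
    return Counter({name: n})


def scale(counts, n):
    """Repeat a group of calls n times."""
    return Counter({name: c * n for name, c in counts.items()})


# [3 MInv + 2 RF] x 3
inner = scale(term("MInv", 3) + term("RF", 2), 3)
# [12 GF + inner] x 2 + 12 GF
block = scale(term("GF", 12) + inner, 2) + term("GF", 12)

trajectory = (
    term("MInv", 6) + term("CG", 1)                      # start of trajectory
    + scale(block + term("CG", 1) + term("HF", 1), 32)   # 32 repeated blocks
    + block + term("MInv", 6) + term("CG", 1)            # final block and end
)

for name in ("MInv", "CG", "GF", "RF", "HF"):
    print(name, trajectory[name])
```

Under this reading, one $\tau = 2$ trajectory comes to 606 MInv, 34 CG, 1188 GF, 396 RF and 32 HF calls, which is consistent with the multimass solver dominating the cost.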

The algorithm described above is implemented and fully optimized for QCDOC in the Columbia
Physics System (CPS [9]). All of the production runs are done on 4096-node QCDOC partitions
at Brookhaven National Laboratory and another 4K partition at Edinburgh Parallel Computing
Center. Each partition runs at 400 MHz, which gives a peak of 800 MFlops/s per processor.

Table 2 shows the performance of each routine in the $24^3$ and $32^3$ DWF evolution. The
multimass solver for the rational quotient part of the action (MInv) is the dominant part,
especially for relatively heavy light quarks ($m_s/m_l < 4$). The large number of nodes in each
partition, together with a feature of CPS that allows only an even number of sites on each node,
makes it necessary in some cases to split the 5th dimension, so that strictly 4-dimensional
routines such as the gauge force duplicate their calculation along the 5th dimension. The effect
is at the level of a percent of the total time.

The sum of the times for the individual routines is slightly less than the total time (by
$\sim 5\%$ of the total time for the $32^3$ ensembles). Most of the discrepancy comes from the
eigenvalue measurement routines, which are run at each Metropolis step to check that the
eigenvalues of the DWF Dirac operators are within the range of the rational quotient
approximation. While the performance of this routine is expected to be close to that of the
inverters, it was not measured and we did not include its flops in the numbers in the table. As a
result, the overall performance is slightly less than 200 MFlops/s per processor, 25% of the peak.
A more detailed analysis of the mass scaling of each routine can be found in [10].

Table 2: Performance of computation routines in DWF RHMC on QCDOC. Bold numbers denote
4-dimensional routines which are duplicated along the 5th dimension when the 5th dimension is split.



3. Basic measurements 

Figure 1 shows the evolution of the plaquette and the chiral condensate. A time series
analysis of these quantities shows that they have autocorrelation times of 7-14 MD units, which
is smaller than what is reported in [...] from meson correlators. Measurements of meson
correlators over more configurations than are currently available are needed to assess how
effectively the RHMC algorithm is generating decorrelated lattice configurations.
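For readers unfamiliar with such a time series analysis, a minimal sketch of an integrated autocorrelation time estimator with a self-consistent summation window follows. The windowing constant `c = 6` and the synthetic AR(1) test series are assumptions made for illustration. The text does not specify which estimator was used for the plaquette and chiral condensate.

```python
import random


def integrated_autocorr_time(series, c=6.0):
    """Estimate tau_int = 1/2 + sum_t rho(t), truncated once t >= c * tau_int.

    The result is in units of the spacing between successive measurements.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 0.5
    tau = 0.5
    for t in range(1, n // 2):
        cov = sum(dev[i] * dev[i + t] for i in range(n - t)) / (n - t)
        tau += cov / var
        if t >= c * tau:
            break
    return tau


def ar1_series(n, rho, seed=0):
    """Synthetic AR(1) series x[i+1] = rho * x[i] + noise (test data only)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# For an AR(1) process the exact value is 1/2 + rho / (1 - rho), i.e. 4.5 for rho = 0.8.
tau_est = integrated_autocorr_time(ar1_series(20000, 0.8))
```

If measurements are separated by a fixed number of MD units, multiplying the estimate by that separation converts it to MD units.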

Figures 2 and 3 show the preliminary results of the residual mass and various hadron mass
measurements. Measurements were done on 30 $m_l = 0.004$ lattices from MD time 300 to 590 with
gauge-fixed box sources of size 20, placed at $t = 0$ and $t = 32$. This was done mostly to ensure
that the lattice spacing and residual mass are within the estimated range, and they will be
measured again with the sources we will use for other measurements. Chiral extrapolation is not
attempted, as we have measurements on only one dynamical mass.

Panels: DWF plaquette evolution ($32^3 \times 64$, $m_s = 0.03$); DWF chiral condensate evolution
($32^3 \times 64$, $m_l = 0.004$, $m_s = 0.03$).

Figure 1: Evolution of the average plaquette and the chiral condensate for the $\beta = 2.25$,
$32^3 \times 64 \times 16$ ensembles. Average plaquette $\langle P \rangle(m_l = 0.004) = 0.615574(13)$
and $\langle P \rangle(m_l = 0.006) = 0.615591(9)$. The $\langle \bar\psi \psi \rangle$ shown here is
from the $m_l = 0.004$ ensemble.

The residual mass is measured by fitting $R(t)$, a ratio of pseudoscalar and mid-point
correlators defined as

$$R(t) = \frac{\langle \sum_x J^a_{5q}(x,t)\, \pi^a(0) \rangle}{\langle \sum_x J^a_5(x,t)\, \pi^a(0) \rangle},$$

to a constant between $t = 6$ and $32$. ($J^a_5(x,t)$ and $J^a_{5q}(x,t)$ are defined in [...].)
While the error from the uncorrelated fit may be an underestimate of the real error, it shows that
the residual mass is $\sim 6 \times 10^{-4}$ in lattice units, or $\sim 1.3$ MeV. Similarly,
fitting meson effective masses gives $a^{-1} \sim 2.2$ GeV. A separate measurement of the lattice
spacing from the heavy quark potential is in progress.
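The constant fit and the unit conversion can be sketched as follows. This assumes an uncorrelated (diagonal) least-squares fit, which reduces to an error-weighted average over the fit window. The plateau data below are synthetic placeholders and not the measured $R(t)$. The function names are hypothetical.

```python
def constant_fit(values, errors, t_min, t_max):
    """Uncorrelated least-squares fit of values[t] to a constant on t_min..t_max.

    Returns (fitted constant, its naive error). Correlations between time
    slices are ignored, so the error may be underestimated.
    """
    window = range(t_min, t_max + 1)
    weights = [1.0 / errors[t] ** 2 for t in window]
    total = sum(weights)
    mean = sum(w * values[t] for w, t in zip(weights, window)) / total
    return mean, total ** -0.5


def lattice_to_mev(m_lat, a_inv_gev):
    """Convert a mass in lattice units to MeV, given a^{-1} in GeV."""
    return m_lat * a_inv_gev * 1000.0


# Synthetic, perfectly flat plateau used only to exercise the functions.
r_values = [6.0e-4] * 33
r_errors = [1.0e-5] * 33
m_res_lat, m_res_err = constant_fit(r_values, r_errors, 6, 32)
m_res_mev = lattice_to_mev(m_res_lat, 2.2)
```

With the values quoted in the text, $6 \times 10^{-4} \times 2.2$ GeV gives about 1.3 MeV.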



4. Conclusions and Discussions 



The RBC, UKQCD and LHPC collaborations are jointly generating dynamical DWF ensembles with
a smaller lattice spacing than those currently available. These ensembles will reduce the
systematic error in the continuum extrapolation of many important physics quantities. Preliminary
measurements suggest $m_{res} \sim m_s/50$ and $a^{-1} \sim 2.2$ GeV, and the errors from residual
chiral symmetry breaking are expected to be $\sim 10^{-4}$ for [...] and $\sim 2\%$ for
$\epsilon'/\epsilon$, according to the estimate in [...].

While recent advances in HMC algorithms have made gauge configuration generation relatively
inexpensive, measurements with multiple valence masses still require significant computational
resources. We are currently working to choose the source that will give optimal overlap with
the hadron states we are interested in studying. We are also studying various deflation
techniques proposed recently [13].
Figure 2: $R(t)$ for different valence quark masses and the pseudoscalar meson effective mass on
the $\beta = 2.25$, $32^3 \times 64 \times 16$, $m_l = 0.004$ ensemble. Quoted errors for $R(t)$ and
the mass are from uncorrelated fits, necessary due to long plateaus.



Panels: vector mass; nucleon mass (each for $32^3 \times 64$, $m_s = 0.03$, $m_l = 0.004$).

Figure 3: The vector meson and nucleon effective masses on the $m_l = 0.004$ ensemble. Error bars
are from correlated fits with $\chi^2/\mathrm{d.o.f.} \sim 1$.